NSF PAR Search | NSF Public Access Repository

Real-time semantic segmentation on FPGAs for autonomous vehicles with hls4ml

https://doi.org/10.1088/2632-2153/ac9cb5

Ghielmetti, Nicolò; Loncar, Vladimir; Pierini, Maurizio; Roed, Marcel; Summers, Sioni; Aarrestad, Thea; Petersson, Christoffer; Linander, Hampus; Ngadiuba, Jennifer; Lin, Kelvin; et al (November 2022, Machine Learning: Science and Technology)

Abstract: In this paper, we investigate how field programmable gate arrays can serve as hardware accelerators for real-time semantic segmentation tasks relevant for autonomous driving. Considering compressed versions of the ENet convolutional neural network architecture, we demonstrate a fully-on-chip deployment with a latency of 4.9 ms per image, using less than 30% of the available resources on a Xilinx ZCU102 evaluation board. The latency is reduced to 3 ms per image when increasing the batch size to ten, corresponding to the use case where the autonomous vehicle receives inputs from multiple cameras simultaneously. We show that, through aggressive filter reduction, heterogeneous quantization-aware training, and an optimized implementation of convolutional layers, the power consumption and resource utilization can be significantly reduced while maintaining accuracy on the Cityscapes dataset.
Full Text Available
GPU coprocessors as a service for deep learning inference in high energy physics

https://doi.org/10.1088/2632-2153/abec21

Krupa, Jeffrey; Lin, Kelvin; Acosta Flechas, Maria; Dinsmore, Jack; Duarte, Javier; Harris, Philip; Hauck, Scott; Holzman, Burt; Hsu, Shih-Chieh; Klijnsma, Thomas; et al (April 2021, Machine Learning: Science and Technology)
Full Text Available
FPGAs-as-a-Service Toolkit (FaaST)

Rankin, Dylan; Krupa, Jeffrey; Harris, Philip; Flechas, Maria; Holzman, Burt; Klijnsma, Thomas; Pedro, Kevin; Tran, Nhan; Hauck, Scott; Hsu, Shih-Chieh; et al (October 2020, ArXivorg)
Computing needs for high energy physics are already intensive and are expected to increase drastically in the coming years. In this context, heterogeneous computing, specifically as-a-service computing, has the potential for significant gains over traditional computing models. Although previous studies and packages in the field of heterogeneous computing have focused on GPUs as accelerators, FPGAs are an extremely promising option as well. A series of workflows are developed to establish the performance capabilities of FPGAs as a service. Multiple different devices and a range of algorithms for use in high energy physics are studied. For a small, dense network, the throughput can be improved by an order of magnitude with respect to GPUs as a service. For large convolutional networks, the throughput is found to be comparable to GPUs as a service. This work represents the first open-source FPGAs-as-a-service toolkit.
Full Text Available
FPGAs-as-a-Service Toolkit (FaaST)

https://doi.org/10.1109/H2RC51942.2020.00010

Rankin, Dylan; Krupa, Jeffrey; Harris, Philip; Flechas, Maria Acosta; Holzman, Burt; Klijnsma, Thomas; Pedro, Kevin; Tran, Nhan; Hauck, Scott; Hsu, Shih-Chieh; et al (November 2020, 2020 IEEE/ACM International Workshop on Heterogeneous High-performance Reconfigurable Computing (H2RC))
Computing needs for high energy physics are already intensive and are expected to increase drastically in the coming years. In this context, heterogeneous computing, specifically as-a-service computing, has the potential for significant gains over traditional computing models. Although previous studies and packages in the field of heterogeneous computing have focused on GPUs as accelerators, FPGAs are an extremely promising option as well. A series of workflows are developed to establish the performance capabilities of FPGAs as a service. Multiple different devices and a range of algorithms for use in high energy physics are studied. For a small, dense network, the throughput can be improved by an order of magnitude with respect to GPUs as a service. For large convolutional networks, the throughput is found to be comparable to GPUs as a service. This work represents the first open-source FPGAs-as-a-service toolkit.
Full Text Available
